Decision-theoretic planning is a popular approach to sequential decision-making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
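To make the single-agent setting referenced above concrete (this sketch is not part of the paper itself), the following Python snippet computes Q* for a toy MDP by value iteration and then extracts a greedy optimal policy from it; the state/action sizes, transition probabilities, and rewards are invented purely for illustration.

```python
import numpy as np

# Toy MDP, purely illustrative: 2 states, 2 actions.
n_states, n_actions = 2, 2
gamma = 0.95  # discount factor

# T[s, a, s'] = transition probability; R[s, a] = expected immediate reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Dynamic programming on Q (value iteration):
#   Q*(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q*(s',a')
Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    Q_new = R + gamma * (T @ Q.max(axis=1))  # batched expectation over s'
    if np.abs(Q_new - Q).max() < 1e-8:       # stop once converged
        Q = Q_new
        break
    Q = Q_new

# Extract an optimal policy greedily from Q*.
policy = Q.argmax(axis=1)
print("Q*:\n", Q)
print("greedy policy:", policy)
```

In Dec-POMDPs no such simple recursion applies directly, which is precisely the gap the paper addresses.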